53 research outputs found

    Improving pattern tracking with a language-aware tree differencing algorithm

    Tracking code fragments of interest is important in monitoring a software project over multiple versions. Various approaches, including our previous work on Herodotos, exploit the notion of Longest Common Subsequence, as computed by readily available tools such as GNU Diff, to map corresponding code fragments. Nevertheless, efficient code differencing algorithms are typically line-based or word-based, and thus do not report changes at the level of language constructs. Furthermore, they identify only additions and removals, not the moving of a block of code from one part of a file to another. Code fragments of interest that fall within the added and removed regions of code have to be manually correlated across versions, which is tedious and error-prone. When studying a very large code base over a long time, the number of manual correlations can become an obstacle to the success of a study. In this paper, we investigate the effect of replacing the line-based algorithm currently used by Herodotos with tree matching, as provided by the algorithm of the differencing tool GumTree. In contrast to the line-based approach, the tree-based approach does not generate any manual correlations, but it incurs a high execution time. To address this problem, we propose a hybrid strategy that gives the best of both approaches.
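    The limitation described above can be illustrated with a minimal sketch of LCS-based line mapping, assuming lines are compared by exact string equality. This is not Herodotos's code: `lcs_line_mapping` is a hypothetical helper using the textbook O(n·m) table, whereas real tools such as GNU Diff use faster algorithms. The example shows that a moved line falls outside the common subsequence and is left unmapped, which is exactly the case that needs manual correlation.

```python
from typing import Dict, List


def lcs_line_mapping(old: List[str], new: List[str]) -> Dict[int, int]:
    """Map 0-based line indices of `old` to `new` along one longest common subsequence."""
    n, m = len(old), len(new)
    # table[i][j] = length of the LCS of old[i:] and new[j:]
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if old[i] == new[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    # Walk the table forward to recover one LCS as a line mapping.
    mapping: Dict[int, int] = {}
    i = j = 0
    while i < n and j < m:
        if old[i] == new[j]:
            mapping[i] = j
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1  # old[i] is treated as removed
        else:
            j += 1  # new[j] is treated as added


    return mapping


old = ["a()", "b()", "c()", "d()"]
new = ["c()", "a()", "b()", "d()"]  # c() was moved to the top
print(lcs_line_mapping(old, new))  # → {0: 1, 1: 2, 3: 3}; line 2 (c()) is unmapped
```

    The moved line `c()` is reported as one removal plus one addition, so a fragment of interest on that line loses its link to the previous version.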

    Hyperparameter Optimization for AST Differencing

    Computing the differences between two versions of the same program is an essential task for software development and software evolution research. AST differencing is the most advanced way of doing so, and an active research area. Yet, AST differencing still relies on default configurations or manual tweaking. In this paper, we present a novel approach named DAT for hyperparameter optimization of AST differencing. We thoroughly state the problem of hyper-configuration for AST differencing. We show that our data-driven approach to hyperoptimizing AST differencing systems increases edit-script quality in up to 53% of cases.
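    The abstract does not say which search strategy DAT uses, so the following is only a generic sketch of hyperparameter search, using exhaustive grid search as a stand-in. The parameter names (`min_height`, `sim_threshold`), the toy objective, and the assumption that a lower score (e.g. a shorter edit script) means higher quality are all hypothetical.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Config = Dict[str, float]


def grid_search(space: Dict[str, List[float]],
                evaluate: Callable[[Config], float]) -> Tuple[Config, float]:
    """Return the configuration with the lowest score, e.g. mean edit-script size."""
    names = sorted(space)
    best_config: Config = {}
    best_score = float("inf")
    # Try every combination of candidate values (the Cartesian product).
    for values in product(*(space[name] for name in names)):
        config = dict(zip(names, values))
        score = evaluate(config)
        if score < best_score:
            best_config, best_score = config, score
    return best_config, best_score


# Hypothetical search space and objective. A real study would run an AST
# differencer with each configuration on a corpus of file pairs and measure
# the resulting edit scripts.
space = {"min_height": [1, 2, 3], "sim_threshold": [0.3, 0.5, 0.7]}


def toy_evaluate(config: Config) -> float:
    return abs(config["min_height"] - 2) + abs(config["sim_threshold"] - 0.5)


print(grid_search(space, toy_evaluate))
```

    Grid search is simple but grows exponentially with the number of parameters; random or model-based search is the usual alternative when the space is large.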

    A grounded theory of Community Package Maintenance Organizations: Registered Report

    a) Context: In many programming language ecosystems, developers rely more and more on external open source dependencies, made available through package managers. Key ecosystem packages that go unmaintained create a health risk for the projects that depend on them and for the ecosystem as a whole. Therefore, community initiatives can emerge to alleviate the problem by adopting packages in need of maintenance. b) Objective: The goal of our study is to explore such community initiatives, which we will henceforth call Community Package Maintenance Organizations (CPMOs), and to build a theory of how and why they emerge, how they function, and what impact they have on the surrounding ecosystems. c) Method: To achieve this, we plan to use a qualitative methodology called Grounded Theory. We have begun applying this methodology by relying on "extant" documents originating from several CPMOs. We present our preliminary results and the research questions that have emerged. We plan to answer these questions by collecting appropriate data (theoretical sampling), in particular by contacting CPMO participants and questioning them via e-mail, questionnaires or semi-structured interviews. d) Impact: Our theory should inform developers willing to launch a CPMO in their own ecosystem and help current CPMO participants better understand the state of the practice and what they could do better.

    Efficient Retrieval and Ranking of Undesired Package Cycles in Large Software Systems

    Many design guidelines state that a software system's architecture should avoid cycles between its packages. Yet such cycles appear again and again in many programs. We believe that existing approaches to cycle detection are too coarse to help developers remove cycles from their programs. In this paper, we describe an efficient algorithm that performs a fine-grained analysis of cycles among application packages. In addition, we define multiple metrics to rank cycles by their level of undesirability, prioritizing the cycles that developers find most undesirable. We compare these ranking metrics on four large and mature software systems in Java and Smalltalk.
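    For contrast with the fine-grained analysis described above, here is a sketch of the kind of coarse detection the abstract argues is insufficient: Tarjan's strongly connected components over a package dependency graph. It is a standard algorithm, not the paper's, and `package_cycles` and the package names are hypothetical. Note that it reports a whole tangle as one block and does no ranking.

```python
from typing import Dict, List, Set


def package_cycles(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Return strongly connected components containing more than one package (Tarjan)."""
    index: Dict[str, int] = {}
    low: Dict[str, int] = {}
    stack: List[str] = []
    on_stack: Set[str] = set()
    result: List[List[str]] = []
    counter = 0

    def visit(pkg: str) -> None:
        nonlocal counter
        index[pkg] = low[pkg] = counter
        counter += 1
        stack.append(pkg)
        on_stack.add(pkg)
        for dep in deps.get(pkg, set()):
            if dep not in index:
                visit(dep)
                low[pkg] = min(low[pkg], low[dep])
            elif dep in on_stack:
                low[pkg] = min(low[pkg], index[dep])
        if low[pkg] == index[pkg]:  # pkg is the root of a component
            component = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.append(member)
                if member == pkg:
                    break
            if len(component) > 1:  # self-dependencies are ignored here
                result.append(sorted(component))

    for pkg in deps:
        if pkg not in index:
            visit(pkg)
    return result


print(package_cycles({"ui": {"model"}, "model": {"util", "ui"}, "util": set(), "io": {"util"}}))
# Two distinct cycles (a-b and a-c) collapse into a single component:
print(package_cycles({"a": {"b", "c"}, "b": {"a"}, "c": {"a"}}))
```

    The second call returns one component `['a', 'b', 'c']` although it contains two separate cycles, which illustrates why a component-level report gives developers little guidance on which dependency to cut.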

    Documentation Reuse: Hot or Not? An Empirical Study

    Having high-quality documentation is critical for software projects. This is why documentation tools such as Javadoc are so popular. As with code, documentation should be reused when possible to increase developer productivity and simplify maintenance. In this paper, we perform an empirical study of duplication in Javadoc documentation on a corpus of seven well-known Java APIs. Our results show that copy-pasted Javadoc tags are abundant in our corpus. We also show that these copy-pastes are caused by four different kinds of relations in the underlying source code. In addition, we show that popular documentation tools do not provide any reuse mechanism to cope with these relations. Finally, we propose a simple but efficient automatic reuse mechanism.
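    A minimal sketch of how duplicated Javadoc tags might be detected, assuming each comment body has already had its `/** */` delimiters and leading `*` characters stripped. The helper `duplicated_tags` and the method names are hypothetical, and the simplified regex would misparse inline tags such as `{@link ...}`. The example pair (an interface method and an implementation) is only one plausible source of copy-paste, not a claim about the four relations the study identifies.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# Matches block tags such as "@param name text" or "@return text".
TAG_RE = re.compile(r"@(\w+)\s+([^@]*)")


def duplicated_tags(docs: Dict[str, str]) -> Dict[Tuple[str, str], List[str]]:
    """Group methods whose comments contain an identical (tag, text) pair."""
    seen: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for method, comment in docs.items():
        for tag, text in TAG_RE.findall(comment):
            normalized = " ".join(text.split())  # ignore whitespace differences
            seen[(tag, normalized)].append(method)
    return {key: methods for key, methods in seen.items() if len(methods) > 1}


docs = {
    "List.add(E)": "Appends the element.\n@param e element to be appended to this list\n@return true",
    "ArrayList.add(E)": "Appends the element.\n@param e element to be appended to this list\n@return true",
    "List.size()": "@return the number of elements in this list",
}
for (tag, text), methods in duplicated_tags(docs).items():
    print(f"@{tag} {text!r} copied in {methods}")
```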

    An Empirical Assessment of Bellon's Clone Benchmark

    Context: Clone benchmarks are essential to the assessment and improvement of clone detection tools and algorithms. Among existing benchmarks, Bellon’s benchmark is widely used by the research community. However, a serious threat to the validity of this benchmark is that the reference clones it contains have been manually validated by Bellon alone. Other people may disagree with Bellon’s judgment. Objective: In this paper, we perform an empirical assessment of Bellon’s benchmark. Method: We seek the opinion of eighteen participants on a subset of Bellon’s benchmark to determine whether researchers should trust the reference clones it contains. Results: Our experiment shows that a significant proportion of the reference clones are debatable, and this phenomenon can introduce noise into results obtained using this benchmark.

    MOON: Assisting Students in Completing Educational Notebook Scenarios

    Jupyter notebooks are increasingly being adopted by teachers to deliver interactive practical sessions to their students. Notebooks come with many attractive features, such as the ability to combine textual explanations, multimedia content, and executable code alongside a flexible execution model that encourages experimentation and exploration. However, this execution model can quickly become an issue when students do not follow the execution order intended by the teacher, leading to errors or misleading results that hinder their learning. To counter this adverse effect, teachers usually write detailed instructions about how students are expected to use the notebooks. Yet, the use of digital media is known to decrease reading efficiency and compliance with written instructions, resulting in frequent notebook misuse and students getting lost during practical sessions. In this article, we present a novel approach, MOON, designed to remedy this problem. The central idea is to provide teachers with a language that enables them to formalize the expected usage of their notebooks in the form of a script, and to interpret this script to guide students with visual indications in real time while they interact with the notebooks. We evaluate our approach using a randomized controlled experiment involving 21 students, which shows that MOON helps students comply better with the intended scenario without hindering their ability to progress. Our follow-up user study shows that about 75% of the surveyed students perceived MOON as rather useful or very useful.
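    The abstract does not describe MOON's scripting language, so the following is only a hypothetical sketch of the underlying idea: a scenario declares which cells must run before others, and an interpreter reacts to each execution with a guidance message. `ScenarioGuide`, its prerequisite format and the cell names are all assumptions. A cell run out of order is not counted as completed, on the assumption that it would likely fail or produce misleading results.

```python
from typing import Dict, List, Set


class ScenarioGuide:
    """Hypothetical guide: a cell counts as done only once its prerequisites have run."""

    def __init__(self, prerequisites: Dict[str, Set[str]]) -> None:
        self.prerequisites = prerequisites
        self.executed: Set[str] = set()

    def missing_before(self, cell: str) -> List[str]:
        """Prerequisites of `cell` not yet executed (an empty list means OK to run)."""
        return sorted(self.prerequisites.get(cell, set()) - self.executed)

    def run(self, cell: str) -> str:
        """Record an execution of `cell` and return the guidance shown to the student."""
        missing = self.missing_before(cell)
        if missing:
            return f"warning: run {', '.join(missing)} before {cell}"
        self.executed.add(cell)
        return f"ok: {cell}"


guide = ScenarioGuide({"train": {"load_data"}, "plot": {"train"}})
for cell in ["train", "load_data", "plot", "train", "plot"]:
    print(guide.run(cell))
```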

    Fine-grained and Accurate Source Code Differencing

    At the heart of software evolution is a sequence of edit actions, called an "edit script", made to a source code file. Since software systems are stored version by version, the edit script has to be computed from these versions, which is known to be a complex task. Existing approaches usually compute edit scripts at the text granularity with only "add line" and "delete line" actions. However, inferring syntactic changes from such an edit script is hard. Moreover, moving code is a frequent action when editing code, so it should also be taken into account. In this paper, we tackle these issues by introducing an algorithm that computes edit scripts at the abstract syntax tree granularity, including move actions. Our objective is to compute edit scripts that are short and close to the original developer intent. Our algorithm is implemented in a freely available and extensible tool that has been extensively validated.
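    To make the notion of an AST-level edit script concrete, here is a minimal sketch of a tree with node identities and four action kinds (insert, delete, update, move) applied in sequence. It shows only how such a script is represented and replayed, not how the paper's algorithm computes it; all class and function names are hypothetical. Moving method `bar` from class `A` to class `B` takes a single `Move`, where a line-based script would need a "delete line" and an "add line" for every line of the method.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union


@dataclass(eq=False)  # compare nodes by identity, not by (recursive) field values
class Node:
    node_id: int
    label: str   # node type, e.g. "MethodDecl"
    value: str = ""  # e.g. an identifier
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)


def tree(node_id: int, label: str, value: str = "", *children: Node) -> Node:
    """Build a node and set the parent pointers of its children."""
    node = Node(node_id, label, value, list(children))
    for child in children:
        child.parent = node
    return node


@dataclass
class Insert:
    node: Node
    parent_id: int
    pos: int


@dataclass
class Delete:
    node_id: int


@dataclass
class Update:
    node_id: int
    value: str


@dataclass
class Move:
    node_id: int
    parent_id: int
    pos: int


Action = Union[Insert, Delete, Update, Move]


def index(root: Node) -> Dict[int, Node]:
    """Map every node id in the tree to its node."""
    nodes = {root.node_id: root}
    for child in root.children:
        nodes.update(index(child))
    return nodes


def apply(root: Node, script: List[Action]) -> None:
    """Replay the actions in order; re-indexing each step keeps the sketch simple."""
    for action in script:
        nodes = index(root)
        if isinstance(action, Insert):
            parent = nodes[action.parent_id]
            action.node.parent = parent
            parent.children.insert(action.pos, action.node)
        elif isinstance(action, Update):
            nodes[action.node_id].value = action.value
        else:  # Delete or Move: detach the node from its current parent first
            node = nodes[action.node_id]
            assert node.parent is not None, "the root cannot be deleted or moved"
            node.parent.children.remove(node)
            if isinstance(action, Move):
                parent = nodes[action.parent_id]
                node.parent = parent
                parent.children.insert(action.pos, node)


root = tree(1, "CompilationUnit", "",
            tree(2, "ClassDecl", "A", tree(3, "MethodDecl", "foo"), tree(4, "MethodDecl", "bar")),
            tree(5, "ClassDecl", "B"))
apply(root, [Move(4, 5, 0), Update(3, "baz")])
nodes = index(root)
print([c.value for c in nodes[2].children], [c.value for c in nodes[5].children])
# → ['baz'] ['bar']
```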